A novel approach to estimating heterozygosity from low-coverage genome sequence
An Investigation Submitted to Genetics

Authors

  • Katarzyna Bryc
  • Nick Patterson
  • David Reich
Abstract

High-throughput shotgun sequence data make it possible in principle to estimate population genetic parameters accurately, without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual’s genome, which is informative about inbreeding and effective population size. However, in many cases the available sequence data for an individual are limited to low coverage, preventing the confident genotype calls needed to count the proportion of heterozygous sites directly. Here, we present a method for estimating an individual’s genome-wide rate of heterozygosity from low-coverage sequence data without an intermediate genotype-calling step. Our method jointly learns the shared allele distribution between the individual and a panel of other individuals, together with the sequencing error distributions and the reference bias. We show that our method works well, first on simulated sequence data and second on real sequence data, where estimates from low-coverage data are consistent with those from higher coverage. We apply our method to estimate the rate of heterozygosity for 11 humans from diverse worldwide populations, and this analysis reveals a complex dependency of local sequencing coverage on the true underlying heterozygosity, which complicates the estimation of heterozygosity from sequence data. We show how filters can correct for the confounding arising from sequencing depth. We find in practice that ratios of heterozygosity are more interpretable than absolute estimates, and show that the ratios we obtain conform closely to previous estimates from higher-coverage data.
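The idea of estimating heterozygosity without calling genotypes can be illustrated with a deliberately simplified sketch. This is not the method described in the abstract, which also uses a panel of other individuals and models reference bias; it is a toy two-component mixture in which every site is assumed to be either homozygous reference or heterozygous, the error rate is assumed known, and the heterozygosity `h` is fit by expectation-maximization over per-site read counts. The function names, the uniform depth range, and the exaggerated heterozygosity used in the simulation are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def estimate_heterozygosity(sites, error_rate=0.01, iterations=200):
    """EM estimate of the fraction of heterozygous sites.

    sites: iterable of (n_reads, n_alt_reads) pairs, one per site.
    Toy model (an assumption): a site is heterozygous with probability h
    (alt reads ~ Binomial(n, 0.5)), otherwise homozygous reference
    (alt reads ~ Binomial(n, error_rate)). No genotype is ever called.
    """
    counts = Counter(sites)  # identical (n, k) sites share one likelihood
    total = sum(counts.values())
    h = 0.01  # arbitrary starting value
    for _ in range(iterations):
        expected_hets = 0.0
        for (n, k), m in counts.items():
            het = h * binom_pmf(k, n, 0.5)
            hom = (1 - h) * binom_pmf(k, n, error_rate)
            expected_hets += m * het / (het + hom)  # E-step: P(het | reads)
        h = expected_hets / total  # M-step: update mixture weight
    return h


def simulate_sites(num_sites, true_h, error_rate, rng, min_depth=2, max_depth=6):
    """Simulate low-coverage read counts under the same toy model."""
    sites = []
    for _ in range(num_sites):
        n = rng.randint(min_depth, max_depth)
        p_alt = 0.5 if rng.random() < true_h else error_rate
        k = sum(rng.random() < p_alt for _ in range(n))
        sites.append((n, k))
    return sites


if __name__ == "__main__":
    rng = random.Random(0)
    # true_h is exaggerated so that a small simulation is informative.
    data = simulate_sites(20000, true_h=0.05, error_rate=0.01, rng=rng)
    print(estimate_heterozygosity(data, error_rate=0.01))
```

Because the estimate sums posterior probabilities across sites rather than counting called genotypes, sites with only two or three reads still contribute partial evidence, which is the property that makes this family of approaches usable at low coverage.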

Similar resources

A novel approach to estimating heterozygosity from low-coverage genome sequence.

High-throughput shotgun sequence data make it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual's genome, which is informative about inbreeding and effective population size. However, in many cases, the available sequence data of an ...

Estimating heterozygosity from a low-coverage genome sequence, leveraging data from other individuals sequenced at the same sites

High-throughput shotgun sequence data makes it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual’s genome, which is informative about inbreeding and effective population size. However, in many cases, the available sequence data of an...

Estimating microsatellite based genetic diversity in Rhode Island Red chicken

This study aimed to estimate microsatellite based genetic diversity in two lines (the selected RIRS and control line RIRC) of Rhode Island Red (RIR) chicken. Genomic DNA of 24 randomly selected birds maintained at Central Avian Research Institute (India) and 24 microsatellite markers were used. Microsatellite alleles were determined on 6% urea-PAGE, recorded using GelDoc system and the samples ...

Gamma reactivation using the spongy effect of KLF1-binding site sequence: an approach in gene therapy for beta-thalassemia

Objective(s): β-thalassemia is one of the most common genetic disorders in the world. As one of the promising treatment strategies, fetal hemoglobin (Hb F) can be induced. The present study was an attempt to reactivate the γ-globin gene by introducing a gene construct containing KLF1 binding sites to the K562 cell line. Materials and Methods: A plasmid containing a 192 bp sequence with two repe...

Inferring Heterozygosity from Ancient and Low Coverage Genomes

While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from sequences with average coverage [Formula: see text] of a s...

Publication date: 2013